-
Notifications
You must be signed in to change notification settings - Fork 28
feat: add initial extension support #395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
07c6293 to
95270ab
Compare
01172ec to
3e58a42
Compare
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments but still haven't gone through most of the PR. Just wanted to flush what I have so far, but I will review more later. Thanks!
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more comments, but excited to get this in! Sorry that this PR has been sitting so long 😅
Refactored ProtoContext trait (removed), by replacing with a concrete type. Some other small refactoring simplifications and code cleanup.
5fae289 to
d3e3627
Compare
benbellick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more comments. Had to run so I didn't get to entirely finish the review but will revisit later. Thanks
| /// Integer parameter (e.g., precision, scale) | ||
| Integer(i64), | ||
| /// Type parameter (nested type) | ||
| Type(ConcreteType), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to point out that the spec technically allows for other type parameters:
message Parameter {
oneof parameter {
// Explicitly null/unspecified parameter, to select the default value (if
// any).
google.protobuf.Empty null = 1;
// Data type parameters, like the i32 in LIST<i32>.
Type data_type = 2;
// Value parameters, like the 10 in VARCHAR<10>.
bool boolean = 3;
int64 integer = 4;
string enum = 5;
string string = 6;
}
}That being said, I would prefer you leave what you have as is since the PR is fairly large already. Just a note for future work!
| /// Parameterized builtin types that require non-type parameters, e.g. numbers | ||
| /// or enum | ||
| #[derive(Clone, Debug, PartialEq)] | ||
| pub enum BuiltinParameterized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to separate out builtin parameterized types which don't take in type parameters from those that do? Not saying this is necessarily wrong, just curious to understand why. Substrait-java treats both e.g. decimal (which takes only an int param) and list as implementors of BaseParameterizedType.
| /// Sub-second precision digits | ||
| precision: i32, | ||
| }, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for these wonderful comments 🙏
| /// Unified representation of simple builtin types (primitive or parameterized). | ||
| /// Does not include container types like List, Map, or Struct. | ||
| #[derive(Clone, Debug, PartialEq)] | ||
| pub enum BuiltinKind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to consistently use Type here? I.e. BuiltinType.
| /// Type parameters (e.g., for generic types) | ||
| pub parameters: Vec<TypeParam>, | ||
| /// Concrete structure definition, if any | ||
| pub structure: Option<ConcreteType>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note (haven't checked yet). It would be good to make sure there is a test specifically for structure.
This is kind of a rarer feature which in which we only recently introduced handling for substrait-go and is still a WIP for substrait-java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah there kind of implicitly is a test for this because of the extension_types.yaml file in the core extensions, which the registry.rs at least checks is able to load everything without an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, and doesn't necessarily have to be this PR, but what do you think about just making the Context and Parse traits pub(crate)? As I understand it, the only public API necessary for parsing are just the impls for ExtensionFile.
| self.simple_extensions | ||
| .get(anchor) | ||
| .ok_or(ContextError::UndefinedSimpleExtension(*anchor)) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this function used anywhere yet. Wondering if it makes sense to just switch the return type to be Option<&Urn>? I could imagine contexts in which we might want to use this function but not consider the "not found" case an error.
| } | ||
|
|
||
| impl<C: Context> Parse<C> for proto::Version { | ||
| impl Parse<ExtensionAnchors> for proto::Version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Does parsing Version require ExtensionAnchors context, or any context at all? Could imagine either passing around () for empty context, or having two traits, one called something like Parse and another called ParseWith.
I'm just dumping some thoughts though :) this doesn't have to be in this PR, especially considering that Parse already existed.
| } | ||
| other => panic!("unexpected error type: {other:?}"), | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked myself yet if it already exists but just noting that it would be good to have tests somewhere that ensure all of the core extension files parse correctly. Could be just as simple as showing they don't result in an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, you already got it in the registry file, nice!
Summary
Begins addressing #367 and #342 by adding support for parsing the types in YAML Simple Extension Files into Rustic types - with validity enforced. This includes a string text parser handling built-in types, compound types, named structs, custom types, and validated parameter constraints in the Simple Extension YAML files.
Scope
ExtensionFile,Registry,CustomType,ConcreteType) and enforces validation of those on creation / read.Key Changes
BuiltinType,CompoundType,ConcreteType,CustomTypewith Display/round‑trip support for alias and named‑struct structures.TryFrom<TypeParamDefsItem>,Parse<RawType>)ProtoContextfromContext, to distinguish between things needed for Protobuf parsing (ProtoContext)u!Name) and type variables; visits extension references for linkage bookkeeping.parsefeature includesserde_yaml;include!(extensions.in)is gated behindextensionsfeature.actions/checkoutto v4, updates Cargo dependency set, and bumps thesubstraitsubmodule.Compatibility Notes
ProtoContexton proto parsing that previously required onlyContext.extensions.innow compiled only withfeatures=["extensions"].Testing
features=["extensions"].